tantivy-sstable 0.2.0

sstables for tantivy
Documentation

SSTable

The tantivy-sstable crate is yet another sstable crate.

It has been designed to be used in quickwit:

  • as an alternative to the default tantivy fst dictionary.
  • as a way to store the column index for dynamic fast fields.

The benefit compared to the fst crate is locality. Searching a key in the fst crate requires downloading the entire dictionary.

Once the sstable index is downloaded, running a get in the sstable crate only requires a single fetch.

Right now, the block index and the default block size have been thought for quickwit, and the performance of a get is very bad.

Sorted strings?

SSTable stands for Sorted String Table. Strings have to be insert in sorted order.

That sorted order is used in different ways:

  • it makes gets and streaming ranges of keys possible.
  • it allows incremental encoding of the keys
  • the front compression is leveraged to optimize the intersection with an automaton

On disk format

Overview of the SSTable format. Unless noted otherwise, numbers are little-endian.

SSTable

+-------+-------+-----+--------+
| Block | Block | ... | Footer |
+-------+-------+-----+--------+
|----( # of blocks)---|
  • Block(SSTBlock): list of independent block, terminated by a single empty block.
  • Footer(SSTFooter)

SSTBlock

+----------+----------+--------+-------+-------+-----+
| BlockLen | Compress | Values | Delta | Delta | ... |
+----------+----------+--------+-------+-------+-----+
                      |        |----( # of deltas)---|
                      |------(maybe compressed)------|
  • BlockLen(u32): length of the block, including the compress byte.
  • Compress(u8): indicate whether block is compressed. 0 if not compressed, 1 if compressed.
  • Values: an application defined format storing a sequence of value, capable of determining it own length
  • Delta

Delta

+---------+--------+
| KeepAdd | Suffix |
+---------+--------+
  • KeepAdd
  • Suffix: KeepAdd.add bytes of key suffix

KeepAdd

KeepAdd can be represented in two different representation, a very compact 1byte one which is enough for most usage, and a longer variable-len one when required

When keep < 16 and add < 16

+-----+------+
| Add | Keep |
+-----+------+
  • Add(u4): number of bytes to push
  • Keep(u4): number of bytes to pop

Otherwise:

+------+------+-----+
| 0x01 | Keep | Add |
+------+------+-----+
  • Add(VInt): number of bytes to push
  • Keep(VInt): number of bytes to pop

Note: as the SSTable does not support redundant keys, there is no ambiguity between both representation. Add is always guaranteed to be non-zero, except for the very first key of an SSTable, where Keep is guaranteed to be zero.

SSTFooter

+-------+-------+-----+-------------+---------+---------+
| Block | Block | ... | IndexOffset | NumTerm | Version |
+-------+-------+-----+-------------+---------+---------+
|----( # of blocks)---|
  • Block(SSTBlock): uses IndexValue for its Values format
  • IndexOffset(u64): Offset to the start of the SSTFooter
  • NumTerm(u64): number of terms in the sstable
  • Version(u32): Currently equal to 2

IndexValue

+------------+----------+-------+-------+-----+
| EntryCount | StartPos | Entry | Entry | ... |
+------------+----------+-------+-------+-----+
                        |---( # of entries)---|
  • EntryCount(VInt): number of entries
  • StartPos(VInt): the start pos of the first (data) block referenced by this (index) block
  • Entry (IndexEntry)

Entry

+----------+--------------+
| BlockLen | FirstOrdinal |
+----------+--------------+
  • BlockLen(VInt): length of the block
  • FirstOrdinal(VInt): ordinal of the first element in the given block